Skip to content

Add a check in non-debug mode for a T_NONE class#573

Closed
jhawthorn wants to merge 188 commits into
masterfrom
call_t_none_assertion
Closed

Add a check in non-debug mode for a T_NONE class#573
jhawthorn wants to merge 188 commits into
masterfrom
call_t_none_assertion

Conversation

@jhawthorn

Copy link
Copy Markdown
Member

Although a crash could occur anywhere, one of the most common symptoms we see from getting a reference to a garbage collected object is crashing while attempting to call a method on it.

These crashes usually occur when trying to perform an rb_id_table_lookup in the "class" cc_tbl, where the class is usually another garbage collected object, because the freelist is stored in the class pointer. This of course isn't guaranteed to happen, as it only works for T_NONE.

This commit aims to have a better error message in this case, in hopes of having a better grouping of errors and to give a better hint to those investigating and triaging crashes.

Example output:

ptr = Fiddle.dlwrap(Object.new)
GC.start
Fiddle.dlunwrap(ptr).to_s
./test.rb:9: [BUG] attempted to search method 'to_s' on a garbage collected object
ruby 3.5.0dev (2025-05-05T16:29:43Z call_t_none_assert.. 7b419c752b) +PRISM [arm64-darwin24]

-- Crash Report log information --------------------------------------------

@samuel-williams-shopify

Copy link
Copy Markdown

Could we consider upstreaming this change? LGTM.

jhawthorn and others added 29 commits May 29, 2026 11:29
The "locale" encoding is only registered once, so we can use an atomic
to avoid the full lock and hash lookup.
* Reserve 2 bits for expressing object layout

We would like to make instance variable reads in the JIT compiler faster
(as well as simplify the JIT implementation).  Currently, in order to
read an instance variable, we have to:

1. Test for heap object
2. Load object to a 64 bit register
3. Mask the object header
4. Bit test against the masked header
5. JNE
6. Load field

We would like to:

1. Test for heap object
2. Load object shape to a 32 bit register
3. Bit test against the shape
4. JNE
5. Load field

The way we fetch instance variables is not consistent across objects.
In order to realize our goal, we need to encode object layout inside the
shape.  If we encode object layout inside the shape, then the shape
itself will guarantee that the access pattern generated by the JIT
compiler is correct.

We should encode the following load patterns into the shape tag bits.
This way we can share shapes on transitions, but be able to
differentiate the access patterns for the JIT compiler.  In other words,
two objects can have an `@a -> @b -> @c` transition and share the same
shape, but the tag bits can differentiate the access pattern so that the
JIT compiler can be confident that the machine code is correct.

Here are the patterns:

1. Embedded/Extended T_OBJECT Instance Variables

Objects with direct references to instance variables or via malloc
buffer

2. Objects with fields_objects fields

These are Data and TypedData objects.  They have an associated axillary
imemo/fields object that stores the instance variables.  The access
pattern is `object[2] + 2`.  The fields object is the 3rd field, and the
instance variables start at +2 inside the fields object.  The fields
object itself is a Ruby object, so it contains the usual header bits +
class headers.

3. Non Boxable Classes / Modules

This is similar to Objects with fields_objects, but the fields object is
stored at a different offset.  We’re differentiating this from boxable
classes and modules because those are harder to support.

4. Other

"Other" pattern is for objects that are rare, or have
difficult-to-implement access patterns.  This includes:

* Boxable classes and modules
* Structs (for now)
* Objects that use the geniv table

Proposed shape bit layout:

```
  Current shape_id_t is 32 bits:
  31        28 27 26 25 24 23 22        19 18                         0
  +-----------+--+--+--+--+--+------------+----------------------------+
  | unused    |L1|L0|OI|FR|CX| heap index | shape tree offset          |
  +-----------+--+--+--+--+--+------------+----------------------------+
               |  |  |  |  |  |            |
               |  |  |  |  |  |            +-- bits 0-18: SHAPE_ID_OFFSET_MASK
               |  |  |  |  |  +--------------- bits 19-22: SHAPE_ID_HEAP_INDEX_MASK
               |  |  |  |  +------------------ bit 23: SHAPE_ID_FL_COMPLEX
               |  |  |  +--------------------- bit 24: SHAPE_ID_FL_FROZEN
               |  |  +------------------------ bit 25: SHAPE_ID_FL_HAS_OBJECT_ID
               +--+--------------------------- bits 26-27: SHAPE_ID_LAYOUT_MASK
```

The important part about these layout patterns is that they do not
reflect the _type_ of object, only how the object is laid out in memory.
For example, we currently treat structs as "other", but we can refactor
them to have the same layout as "Objects with fields_objects", and when
we do that they should get a different bit in the shape header.

This commit only reserves the two bits, it doesn't use them in the JIT
compiler yet.

Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Max Bernstein <tekknolagi@gmail.com>

* Update gc.c

Co-authored-by: Nobuyoshi Nakada <nobu.nakada@gmail.com>

* Update shape.h

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

* fix function name

* Update shape.c

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

* fix function name

* Revert "Update shape.c"

This reverts commit 900711d.

* add comment

---------

Co-authored-by: John Hawthorn <john@hawthorn.email>
Co-authored-by: Max Bernstein <tekknolagi@gmail.com>
Co-authored-by: Nobuyoshi Nakada <nobu.nakada@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Replace RSA keys for intermediate_key and ee_key with RSA 4096-bit keys
rsa-1.pem and rsa-2.pem. At least RSA 2048-bit keys are required for signing
and encryption in FIPS.

SP 800-131A Rev. 2
* 3. Digital Signatures
* 6. Key Agreement and Key Transport Using RSA
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
https://github.com/openssl/openssl/blob/71943544885ff364a10bcc5ffc62d0e651c9a021/providers/common/securitycheck.c#L72-L73

```
$ openssl rsa -in test/openssl/fixtures/pkey/rsa-1.pem -text -noout | head -1
Private-Key: (4096 bit, 2 primes)

$ openssl rsa -in test/openssl/fixtures/pkey/rsa-2.pem -text -noout | head -1
Private-Key: (4096 bit, 2 primes)
```

ruby/openssl@f130312442
Feeding a deeply nested constructed encoding to OpenSSL::ASN1.decode,
.decode_all, or .traverse can cause unbounded recursion and result in
SystemStackError.

Add an explicit nesting depth limit of 200 levels and raise
OpenSSL::ASN1::ASN1Error if it is exceeded. This limit is arbitrary and
currently not configurable, but should be sufficient for any practical
use cases.

Fixes https://hackerone.com/reports/3662125

ruby/openssl@fc753239cc
`"%" PRIsVALUE` just copies/appends the corresponding argument as a
string.
    gc/mmtk/mmtk.c:505:1: warning: function 'rb_mmtk_gc_thread_panic_handler' could be declared with attribute 'noreturn' [-Wmissing-noreturn]
      505 | {
          | ^
    gc/mmtk/mmtk.c:987:27: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
      987 |         char obj_info_buf[info_size];
          |                           ^~~~~~~~~
    gc/mmtk/mmtk.c:990:34: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]
      990 |         char parent_obj_info_buf[info_size];
          |                                  ^~~~~~~~~

ruby/mmtk@8ce21cdf10
Bumps the github-actions group with 1 update in the / directory: [taiki-e/install-action](https://github.com/taiki-e/install-action).


Updates `taiki-e/install-action` from 2.79.11 to 2.81.1
- [Release notes](https://github.com/taiki-e/install-action/releases)
- [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
- [Commits](taiki-e/install-action@13608cb...e49978b)

---
updated-dependencies:
- dependency-name: taiki-e/install-action
  dependency-version: 2.81.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
In hash_replace_ref, it currently evaluates rb_gc_location twice for the
key and value. We can reduce that to once each.
Document missing gemrc configuration keys. The documented configuration key
order aligns with the following part.

https://github.com/ruby/rubygems/blob/287175d4ae57340e22c6f1d21da1489d7dac35c5/lib/rubygems/config_file.rb#L240-L257

Add and update tests for missing gemrc configuration use_psych and gemhome key
assertions.

Fix test_initialize_global_gem_cache_gemrc and
test_initialize_concurrent_downloads to test with symbolic keys,
as global_gem_cache and use_psych keys are documented as a symbolic key.

ruby/rubygems@8baf4d49a4
(ruby/strscan#205)

See also: https://bugs.ruby-lang.org/issues/21943

This is semantically equivalent to `scanner[specifier]&.to_i(base)` but
this is faster than `scanner[specifier]&.to_i(base)` because
`integer_at` doesn't create a temporary String when possible.

This PR also includes a benchmark for them:

```console
$ ruby -v -S benchmark-driver benchmark/integer_at.yaml
ruby 4.1.0dev (2026-05-01T19:25:51Z master ruby/strscan@f2845eab29) +PRISM [x86_64-linux]
Warming up --------------------------------------
             [].to_i    24.272M i/s -     25.109M times in 1.034481s (41.20ns/i, 32clocks/i)
          integer_at    61.188M i/s -     62.491M times in 1.021289s (16.34ns/i, 62clocks/i)
Calculating -------------------------------------
             [].to_i    26.831M i/s -     72.816M times in 2.713883s (37.27ns/i, 169clocks/i)
          integer_at    81.331M i/s -    183.564M times in 2.256998s (12.30ns/i, 43clocks/i)

Comparison:
          integer_at:  81331225.5 i/s
             [].to_i:  26831046.3 i/s - 3.03x  slower
```

In this environment, `integer_at` is 3.03x faster than `[].to_i`.

ruby/strscan@8a60879b2d

Co-authored-by: jinroq <jinroq@gmail.com>
The earlier `rake vendor:compact_index` hook into `dev:deps` and the
hard-copy step in ruby-core.yml fell apart in ruby/ruby's test-bundler
runner, which sets TMPDIR per process and does not invoke our rake
tasks. Pull the fetch logic into `Spec::Rubygems.install_vendored_compact_index`
and call it from `install_test_deps` so every test setup path - local
`bin/rspec`, `bin/parallel_rspec`, GHA bundler.yml, and ruby/ruby's
test-bundler - lands the files at `Path.tmp_root.join("compact_index")`
exactly where the artifice already looks. The standalone rake task and
its workflow hop are no longer needed.

ruby/rubygems@d8536e115e

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… is set

Previously the install_vendored_compact_index short-circuit only checked
whether `tmp/compact_index/lib/compact_index.rb` existed, so once any ref
was vendored a subsequent `COMPACT_INDEX_REF=<sha> bin/rspec ...` kept
serving the stale copy. Drop the vendor tree first when the env var is
explicitly set so an override always re-fetches against the requested
ref.

ruby/rubygems@db5d06953f

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The vendored compact_index install ran without any coordination, so two
test setups starting at once could both write into tmp/compact_index/ at
the same time. The skip guard also checked a single file, which meant an
interrupted download leaving only lib/compact_index.rb behind would be
treated as a complete vendor tree on the next run.

Take an exclusive file lock around the install and only skip the download
once every expected file is present, removing the tree under the same lock
when COMPACT_INDEX_REF forces a refresh.

ruby/rubygems@0451700769

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rb_gc_modular_gc_loaded_p and rb_gc_active_gc_name are only used when
compiling with modular GC enabled.
Prism accepted a modifier conditional as the predicate of an `if`/`unless`
(and `elsif`), while parse.y rejects it:

    if a if b then end      # parse.y: SyntaxError, prism: accepted
    unless a unless b then end

The `if`/`unless`/`elsif` predicate was parsed with a floor of
`PM_BINDING_POWER_MODIFIER`, which is inclusive of the modifier
conditionals (`if`/`unless`/`while`/`until`, left binding power
`PM_BINDING_POWER_MODIFIER`), so they were absorbed into the predicate.
`while`/`until` predicates already use `PM_BINDING_POWER_COMPOSITION` and
reject these correctly.

Raise the floor to `PM_BINDING_POWER_MODIFIER + 1` so the predicate still
absorbs `and`/`or` (and tighter operators) but excludes the modifier
conditionals, matching parse.y and aligning `if`/`unless` with
`while`/`until`.

ruby/prism@0fa8d8f8d8
…dicate floor

The `+ 1` form is meant to express associativity, not a binding-power
floor; use the named constant instead, matching while/until conditions.

ruby/prism@53d1ee76c6
`gc_prof_mark_timer_start` and `gc_prof_mark_timer_stop` include DTrace
hooks for the `MARK_BEGIN` and `MARK_END` events, respectively.
Previously, those probes are only triggered in `gc_marks`.  However,
`gc_marks_continue` and `gc_rest` also contain marking activities, but
are not captured by the probes.

We move the invocation of `gc_prof_mark_timer_start` and
`gc_prof_mark_timer_stop` into `gc_marking_enter` and `gc_marking_exit`
to ensure all marking activities are captured by the probes.
Fold arithmetic identity operations such as `x + 0`, `x - 0`, `x * 1`, `x / 1` in `fold_constants`.
When VM state is corrupted enough, we can call abort() from the SIGABRT
handler. Previously, we would spam until the stack is full:

    ABRT received in SEGV handler
    SEGV received in ABRT handler
    ABRT received in SEGV handler
    ABRT received in ABRT handler
    ABRT received in ABRT handler
    ABRT received in ABRT handler
    [...]

We've seen this on CI:
https://github.com/ruby/ruby/actions/runs/26591192708/job/78350130653

To test this situation locally, temporarily patch in a call to abort()
in rb_bug_for_fatal_signal() then use `Process.kill(:ABRT, Process.pid)`.

Fix by restoring the default signal handler before aborting.
The completeness check skips the download once every expected file exists,
but File.write truncates before writing, so an install interrupted partway
leaves a half-written file in place. A later run would see it, treat the
vendor tree as complete, and load a truncated source.

Download each file to a sibling .tmp path and rename it into place. The
rename stays within the vendor tree, so it is atomic and a target file is
only ever observed fully written.

ruby/rubygems@a2f265f579

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eregon and others added 27 commits June 6, 2026 12:53
It already implictly is on recent rubies because it has no
mark function, but might as well make it explicit.

ruby/json@8d7f975b01
Suppres it for now because JRuby version does not support it yet.

ruby/json@8144d4cb34
The legacy dependency API exposes no per-version publish dates, so the
cooldown filter has nothing to compare against and silently does
nothing. Document that limitation as a regression guard rather than a
surprise.

ruby/rubygems@ce952c1b9f

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A single converge can hold several rubygems sources, each keyed by its
own remotes. A partial update re-converges the still-locked sources, the
path that used to drop cooldown, so lock in that the cooldown stays
attached to the source that declared it for both a top-level source and
a gem-block source.

ruby/rubygems@2fcb2d70cc

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bundle add re-resolves the Gemfile against the existing lockfile via the
injector, which converges sources the same way install and update do. A
gem added there must inherit the source's cooldown instead of grabbing
an in-window release.

ruby/rubygems@cfed95d1dd

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This is the exact path from the original report: regenerating the
lockfile must not advance a gem into the cooldown window. Assert the
written lockfile to guard the lock-only flow that install and update
specs do not touch.

ruby/rubygems@293d7c7dc3

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A frozen install reads the lockfile rather than resolving, so cooldown
never runs. Document that a version locked inside the window still
installs, so the bypass stays intentional.

ruby/rubygems@bfc099bc26

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A mirror rewrites the fetch URI while cooldown stays keyed by the URI
declared in the Gemfile. Confirm the redirect to the serving mirror does
not lose the cooldown.

ruby/rubygems@ea2bb164f1

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stamping each solo_gem with its own Time.now.utc lets the two dates
drift apart and matches neither the surrounding before block. Snapshot
the time once so the cooldown window stays stable as thresholds tighten.

ruby/rubygems@69c6d876d4

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Although a crash could occur anywhere, one of the most common symptoms
we see from getting a reference to a garbage collected object is
crashing while attempting to call a method on it.

These crashes usually occur when trying to perform an rb_id_table_lookup
in the "class" cc_tbl, where the class is usually another garbage
collected object, because the freelist is stored in the class pointer.

This commit aims to have a better error message in this case, in hopes
of having a better grouping of errors and to give a better hint to those
investigating and triaging crashes.
@jhawthorn jhawthorn force-pushed the call_t_none_assertion branch from d46b726 to 9f5fc98 Compare June 8, 2026 17:36
@jhawthorn jhawthorn closed this Jun 8, 2026
@jhawthorn jhawthorn deleted the call_t_none_assertion branch June 8, 2026 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.